Add prometheus.relabel external cache proposal #1600

Open · wants to merge 4 commits into base: main from prometheus-relabel-external-cache

Conversation

wilfriedroset

PR Description

This PR adds a proposal to introduce support for an external cache for the prometheus.relabel component.

Which issue(s) this PR fixes

N/A

Notes to the Reviewers

@wilfriedroset wilfriedroset force-pushed the prometheus-relabel-external-cache branch from 92c70da to 9f093a1 Compare September 3, 2024 13:20
forward_to = [...]
cache {
  backend = "redis"
  redis {
Collaborator

Instead of a bespoke redis/memcached action, how about a redis and memcached component that exports a key value interface? Along with generic accessors. I could see this being used for more than just relabel. Much in the same way we allow components to consume strings but those strings can come from vault, file, http etc.

Collaborator

storage.redis "relabel" {
  url = "example.com"
}

prometheus.relabel "relabel" {
  cache = storage.redis.relabel.cache
}

// Just for fun
prometheus.remote_write "mimir" {
  endpoint {
    external_labels = {
      cluster = coalesce(storage.redis.relabel["cluster"],"unknown")
    }
  }
}

Collaborator

Could even get clever and allow prometheus.relabel to have a list of caches: if there are problems with redis, fall back to in-memory.

prometheus.relabel "relabel" {
  caches = [storage.redis.relabel.cache,storage.memory.relabel]
}

Author

I definitely like the idea of having a unified way to define the caching mechanism.

if there are problems with redis then fallback to in memory.

You are correct, but this adds extra complexity to take into account. A highly available external cache is indeed recommended, but having a fallback is definitely a good idea.

The extra complexity comes from the fact that when the external cache becomes unavailable, Alloy needs to fall back to the in-memory one. This could lead to OOM, e.g. you planned to use an external cache, you fall back to in-memory, but the memory limit of your pod is not high enough.
Then Alloy needs to check whether the external storage is available again. Once it is, Alloy needs to move its in-memory key/value pairs back to the external cache. But we are exposed to a race condition when two Alloy instances use the same cache.
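
To make that complexity concrete, a minimal sketch of such a fallback wrapper might look like the code below; the Cache interface and every name here are hypothetical, only illustrate the paragraph above, and are not part of the proposal:

// Hypothetical minimal cache interface, for illustration only.
type Cache interface {
    Get(key string) (any, bool)
    Set(key string, value any, ttl int)
}

// fallbackCache prefers the external cache and falls back to the in-memory one.
type fallbackCache struct {
    external Cache // e.g. redis-backed
    memory   Cache // bounded by the pod memory limit, hence the OOM risk above
}

func (f *fallbackCache) Get(key string) (any, bool) {
    if v, ok := f.external.Get(key); ok {
        return v, true
    }
    // Miss, or the external cache is unavailable: serve from memory.
    return f.memory.Get(key)
}

func (f *fallbackCache) Set(key string, value any, ttl int) {
    // Write-through to both. Moving in-memory entries back to the external
    // cache once it recovers, and the race between two Alloy instances doing
    // so, is exactly the complexity this sketch does not address.
    f.external.Set(key, value, ttl)
    f.memory.Set(key, value, ttl)
}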

That being said, I'm not that familiar with the code base yet, and your proposal seems harder to achieve.

What do you think about extending our proposal to loki.relabel and standardizing the whole thing, as prometheus and loki are the main components using a cache?
This would serve as a first step toward your proposal. Something like so:

type RelabelCache interface {
    Get(key uint64) (*labelAndID, error)
    Set(key uint64, value *labelAndID) error
    Remove(key uint64)
    Clear(newSize int) error
    GetCacheSize() int
} 

Once done, the associated work to create the storage components such as storage.redis could be picked up right after the initial proposal.
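
For reference, a purely illustrative in-memory implementation of the RelabelCache interface above could look like the sketch below; labelAndID is only a placeholder for the existing internal type, and none of this is the actual Alloy code:

import "errors"

// labelAndID is a placeholder for the internal type used by prometheus.relabel.
type labelAndID struct{}

var errNotFound = errors.New("key not found")

// inMemoryRelabelCache is a naive, non-thread-safe, map-backed implementation.
type inMemoryRelabelCache struct {
    entries map[uint64]*labelAndID
}

func newInMemoryRelabelCache(size int) *inMemoryRelabelCache {
    return &inMemoryRelabelCache{entries: make(map[uint64]*labelAndID, size)}
}

func (c *inMemoryRelabelCache) Get(key uint64) (*labelAndID, error) {
    if v, ok := c.entries[key]; ok {
        return v, nil
    }
    return nil, errNotFound
}

func (c *inMemoryRelabelCache) Set(key uint64, value *labelAndID) error {
    c.entries[key] = value
    return nil
}

func (c *inMemoryRelabelCache) Remove(key uint64) {
    delete(c.entries, key)
}

func (c *inMemoryRelabelCache) Clear(newSize int) error {
    c.entries = make(map[uint64]*labelAndID, newSize)
    return nil
}

func (c *inMemoryRelabelCache) GetCacheSize() int {
    return len(c.entries)
}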

Collaborator

Fallback caches are likely a stretch, though I do feel the kv store should be a generic component. As for the interface, I would go with something like the below. Disclaimer: this is getting into the deeper technical ideas. The reason keys might want to be strings is to mirror how Alloy uses maps, where the key is always a string.

type Cache interface {
    Get(key string) (any, bool)
    Set(key string, value any, ttl int)
    Remove(key string)
    Clear(newSize int) error
    GetCacheSize() int
}
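
Purely as a sketch, a redis-backed implementation of such an interface might look roughly like this, using the go-redis client; value (de)serialization, context handling, and error handling are hand-waved, and all names are made up for illustration:

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

// redisCache is an illustrative implementation of the Cache interface above.
// A real implementation would need to (de)serialize values rather than store them as-is.
type redisCache struct {
    client *redis.Client
}

func newRedisCache(addr string) *redisCache {
    return &redisCache{client: redis.NewClient(&redis.Options{Addr: addr})}
}

func (c *redisCache) Get(key string) (any, bool) {
    val, err := c.client.Get(context.Background(), key).Result()
    if err != nil { // redis.Nil on a miss, or a transport error
        return nil, false
    }
    return val, true
}

func (c *redisCache) Set(key string, value any, ttl int) {
    c.client.Set(context.Background(), key, value, time.Duration(ttl)*time.Second)
}

func (c *redisCache) Remove(key string) {
    c.client.Del(context.Background(), key)
}

func (c *redisCache) Clear(newSize int) error {
    // newSize has no direct meaning for redis; flush the current database instead.
    return c.client.FlushDB(context.Background()).Err()
}

func (c *redisCache) GetCacheSize() int {
    n, err := c.client.DBSize(context.Background()).Result()
    if err != nil {
        return 0
    }
    return int(n)
}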

Collaborator

You shouldn't have to specifically use that for the first round. Capsules are mainly for when you want to bridge into it being accessible within the Alloy config. In this case you're likely passing an interface/pointer from Go code to Go code, which should work without problem.

Author

I've updated the proposal to reflect what we have been discussing.

Collaborator

I also think we will likely need GetMultiple(keys []string) ([]Results) and SetMultiple. Imagine a scenario where you are scraping a million records; hitting redis each time is going to be pretty expensive at one request per key.
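
Sketching what that could look like on top of the Cache interface above (the map-based result shape and all names are only placeholders, not a settled design):

// BatchCache extends the Cache interface with batched accessors so that a
// scrape of many series turns into one round trip instead of one per key.
type BatchCache interface {
    Cache

    // GetMultiple returns the values found for keys; missing keys are simply absent.
    GetMultiple(keys []string) (map[string]any, error)

    // SetMultiple stores all entries with a shared ttl, e.g. via a redis pipeline or MSET.
    SetMultiple(entries map[string]any, ttl int) error
}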

Author

@mattdurham I've tried to take your feedback into account; the PR should be fine now, what do you think about it?
We have started to work on the code with @pbailhache. What would be the next step regarding this proposal? Should we merge it if it is to your liking and iterate over the code?

p.s: I don't know what to do about the build 😅


Just opened a draft pull request for the feature: #1692

This is still a work in progress and I'm ready to iterate over it.

@mattdurham mattdurham self-assigned this Sep 3, 2024
@wilfriedroset wilfriedroset requested a review from a team as a code owner September 7, 2024 21:15
@wilfriedroset wilfriedroset force-pushed the prometheus-relabel-external-cache branch from b1f8f74 to 8d68d6b Compare September 12, 2024 13:12

## Compatibility

This proposal offers to deprecate the old way of configuring the in-memory cache for a couple of releases, then drop it in release+2. Doing so allows users to migrate their settings to the correct block.
@tpaschalis (Member) · Sep 13, 2024

I think our accepted backwards compatibility guarantees go against this. Any breaking changes for stable functionality would have to wait until v2.0.

Is there any reason for wanting to deprecate it? We can't control the pace of users adopting new versions, so IMO doing our best to avoid breaking their configurations and use cases would be the way to go.

Author

We don't want to deprecate anything sooner than what is guaranteed.
I will rephrase to say that we keep both ways compatible until the next major release.

Contributor

This PR has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If you do not have enough time to follow up on this PR or you think it's no longer relevant, consider closing it.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your PR will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@wilfriedroset (Author)

This is not inactive, as we are iterating on an implementation in a distinct PR.
